SUMMARY

A data-based procedure is introduced for local bandwidth selection for kernel estimation of a
regression function. The procedure has the practical advantage that it reduces the need for a priori
values and does not require pilot estimates of the regression function, optimization of estimated
objective functions or resampling. A small Monte Carlo study is used to examine the behavior of the
new bandwidth estimator in a variety of situations. The resulting finite-sample mean square errors of
the corresponding curve estimates are generally found to be less than or equal to those of an
idealized plug-in estimator.
1. INTRODUCTION

Bandwidth selection occupies an important role in the literature of nonparametric regression
(cf. Marron, 1989, or Eubank, 1988, for references). With few exceptions, the primary emphasis of this
work has been on the selection of globally optimal bandwidths. However, it is known (see, e.g., Müller,
1988, and Staniswalis, 1989) that gains in estimator performance can be realized by optimizing the
bandwidth locally rather than on a global basis. Thus, in this paper we present a simple, effective
method for selecting local bandwidths in kernel regression.

Consider the nonparametric regression model where responses $y_1, \ldots, y_n$ are observed following
the model

$$ y_i = m(t_i) + \epsilon_i, \quad i = 1, \ldots, n. \tag{1.1} $$

Here the $\epsilon_i$ are independent, identically distributed random variables with zero mean and finite
variance $\sigma^2$, the $t_i$ satisfy $0 \le t_1 \le \cdots \le t_n \le 1$ and $m$ is an unknown function. Without having to
assume more about $m$ than certain smoothness conditions, we wish to estimate $m$ at some fixed
argument $t$.

There are many good estimators for $m(t)$. Examples of these can be found in Eubank (1988)
and Müller (1988). In particular, the Priestley-Chao kernel estimator of $m$ at $t$ is

$$ \hat m_h(t) = \frac{1}{h} \sum_{i=1}^{n} (t_i - t_{i-1}) K\!\left(\frac{t - t_i}{h}\right) y_i, \tag{1.2} $$

where $K$ is a kernel function, $h > 0$ is the bandwidth or smoothing parameter and $t_0 = 0$. The kernel $K$
is assumed to be continuously differentiable, symmetric with support on $[-1, 1]$ and of order $p$ in the
sense that

$$ \int_{-1}^{1} z^j K(z)\,dz = \begin{cases} 1, & j = 0, \\ 0, & j = 1, \ldots, p-1, \\ k_p \ne 0, & j = p. \end{cases} \tag{1.3} $$
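As an illustration of (1.2), the following minimal sketch evaluates the Priestley-Chao estimator on an equally spaced design. The kernel $K(z) = \frac{3}{4}(1 - z^2)$ is the second-order kernel used later in the paper; the function names and the noise-free linear test function are assumptions made only for this example.

```python
def epanechnikov(z):
    """Second-order kernel K(z) = (3/4)(1 - z^2) supported on [-1, 1]."""
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def priestley_chao(t, ts, ys, h, kernel=epanechnikov):
    """Priestley-Chao estimate of m(t) as in (1.2), taking t_0 = 0."""
    total = 0.0
    prev = 0.0  # t_0
    for ti, yi in zip(ts, ys):
        total += (ti - prev) * kernel((t - ti) / h) * yi
        prev = ti
    return total / h


# Noise-free illustration on the design t_i = i/n (hypothetical test function).
n = 200
ts = [i / n for i in range(1, n + 1)]
ys = [2.0 + 3.0 * ti for ti in ts]  # m(t) = 2 + 3t, so m(0.5) = 3.5
estimate = priestley_chao(0.5, ts, ys, h=0.2)
```

Because the kernel is symmetric and the window lies inside $[0, 1]$, the estimate reproduces a linear function up to the small discretization error of the kernel weights.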
To use $\hat m_h$ in practice one requires choices for both $h$ and $K$. Discussions of methods for
selecting $K$ can be found in Müller (1988). We will concentrate here on the problem of selecting $h$. The
value used for $h$ will be allowed to depend on the point of estimation $t$. Our goal is to find a good
choice of $h$ for each value of $t$ in the sense of making the mean squared error (mse) of estimation as
small as possible.

There are several data adaptive local bandwidth selection techniques that have been proposed
in the literature. Modifications of squared-error cross validation for consistent estimation of optimal
local smoothing have been introduced by Hall and Schucany (1989) and Vieu (1990). An alternative
resampling approach that uses the bootstrap to estimate the mse of $\hat m_h(t)$ is described by Härdle and
Bowman (1988). Two other approaches to estimating the mse that use pilot estimates of $m(t)$ have
been studied by Müller (1985) and Staniswalis (1989). All of these algorithms involve a search for a
local minimum of an estimated mse and require the specification of some other tuning parameter, e.g.,
a global bandwidth for a pilot estimate of $m$. In contrast, the technique that we propose essentially
does not require such initial values and there is no search required for the minima of a cross-validation
or estimated mse function.

Our approach to local bandwidth selection stems from some simple asymptotic analysis. Let
$\mathrm{Var}(h) = \mathrm{Var}(\hat m_h(t))$ and $\mathrm{Bias}(h) = E\hat m_h(t) - m(t)$. Then, standard Taylor expansions reveal that if
$m \in C^p[0, 1]$, the mean squared error $\mathrm{mse}[\hat m_h(t)] = E[\hat m_h(t) - m(t)]^2$ can be written as

$$ \mathrm{mse}[\hat m_h(t)] = \mathrm{Var}(h) + \mathrm{Bias}^2(h) = \frac{\sigma^2 Q}{nh} + \left[ h^p k_p m^{(p)}(t)/p! \right]^2 + o\!\left(\frac{1}{nh}\right) + o(h^{2p}), \tag{1.4} $$

where $Q = \int_{-1}^{1} K^2(z)\,dz$ and $k_p = \int_{-1}^{1} z^p K(z)\,dz$. Minimization of (1.4) with respect to $h$ yields

$$ h^* = \left[ \frac{\sigma^2 Q}{2pn\,(k_p m^{(p)}(t)/p!)^2} \right]^{1/(2p+1)} \tag{1.5} $$

if we ignore higher order terms. By substituting (1.5) into (1.4) we then obtain

$$ \mathrm{Var}(h^*) = 2p\,\mathrm{Bias}^2(h^*), \tag{1.6} $$

again neglecting higher order terms. More general results for integrated mse and derivative estimation
can be found in Gasser, Müller, Köhler, Molinari and Prader (1984) and Müller (1988).
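The formulas (1.5) and (1.6) can be checked numerically. The sketch below evaluates $h^*$ for the setting of Section 4 ($m(t) = \sin(t)$ at $t = .5$, $p = 2$, $K(z) = \frac{3}{4}(1 - z^2)$, for which $Q = 3/5$ and $k_2 = 1/5$) and confirms the variance and bias balance. The function names are illustrative assumptions, not part of the paper.

```python
import math


def h_star(sigma, n, p, Q, k_p, m_p):
    """Asymptotically optimal local bandwidth from (1.5).

    m_p is the p-th derivative of m at the point of estimation.
    """
    c = k_p * m_p / math.factorial(p)
    return (sigma ** 2 * Q / (2 * p * n * c ** 2)) ** (1.0 / (2 * p + 1))


def leading_terms(h, sigma, n, p, Q, k_p, m_p):
    """Leading variance and squared-bias terms of (1.4)."""
    var = sigma ** 2 * Q / (n * h)
    bias2 = (h ** p * k_p * m_p / math.factorial(p)) ** 2
    return var, bias2


# Section 4 setting: m(t) = sin(t) at t = .5, K(z) = (3/4)(1 - z^2), p = 2.
Q, k2 = 3.0 / 5.0, 1.0 / 5.0   # integral of K^2 and of z^2 K for this kernel
m2 = -math.sin(0.5)            # m''(.5)
h_opt = h_star(0.05, 100, 2, Q, k2, m2)
var, bias2 = leading_terms(h_opt, 0.05, 100, 2, Q, k2, m2)
```

With $\sigma = .05$ and $n = 100$ this gives $h^* \approx .277$, in agreement with the corresponding entry of Table 1, and the leading variance term equals $2p = 4$ times the leading squared-bias term, as (1.6) requires.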

The basic proposal here is to capitalize on the balance between variance and bias present in
(1.6). We first estimate both the variance and bias over a grid of fixed $h$ values. For large $n$ we should
have for any fixed $h$ that $\mathrm{Var}(h) \approx A/nh$ and $\mathrm{Bias}^2(h) \approx Bh^{2p}$ for constants $A$ and $B$. Thus, given
several estimated values of the variance and bias one can obtain estimates $\hat A$ and $\hat B$ of $A$ and $B$ (e.g.,
by least squares) and then solve (1.6) to find the adaptive bandwidth choice

$$ \hat h_t = \left[ \frac{\hat A}{2pn\hat B} \right]^{1/(2p+1)}. \tag{1.7} $$

In Section 3 we will show that $\hat h_t$ is consistent and asymptotically normal as an estimator of $h^*$ and
attains the same convergence rate as plug-in estimators.

The remainder of the paper is organized as follows. In the next section we provide details
concerning the computation of our bandwidth estimator. Then, in Section 3, asymptotic properties of
$\hat h_t$ are described. The findings of a small simulation experiment are presented in Section 4. Finally,
some concluding remarks are collected in Section 5.
2. ADAPTIVE BANDWIDTHS

In this section we give a detailed description of our method for local bandwidth selection.
Throughout the remainder of the paper we assume that the design is equally spaced, i.e., $t_i = i/n$. It
should be emphasized however that this is merely for simplicity and the approach extends directly to
more general designs.

Two essential ingredients of the proposed method are estimators of $\mathrm{Var}(h)$ and $\mathrm{Bias}(h)$. The
exact variance of $\hat m_h(t)$ is

$$ \mathrm{Var}(h) = \sigma^2 \sum_{i=1}^{n} \frac{1}{n^2 h^2} K^2\!\left(\frac{t - t_i}{h}\right). \tag{2.1} $$

Thus, to estimate the variance we need only estimate $\sigma^2$. For this purpose we use the estimator
proposed by Gasser, Sroka and Jennen-Steinmetz (1986) which has the form

$$ \hat\sigma^2 = \frac{1}{n-2} \sum_{i=2}^{n-1} [Y_{i-1} - 2Y_i + Y_{i+1}]^2/6 $$

for an equispaced design. Consequently, $\mathrm{Var}(h)$ can be easily estimated for any given value
of $h$ by replacing $\sigma^2$ by $\hat\sigma^2$ in (2.1). We denote the result by $\widehat{\mathrm{Var}}(h)$.
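A minimal sketch of the variance step, assuming the equally spaced design $t_i = i/n$ and the kernel $K(z) = \frac{3}{4}(1 - z^2)$; the function names are illustrative and not taken from the paper.

```python
def gsj_variance(ys):
    """Difference-based estimate of sigma^2 for an equispaced design."""
    n = len(ys)
    total = 0.0
    for i in range(1, n - 1):
        d = ys[i - 1] - 2.0 * ys[i] + ys[i + 1]  # second difference
        total += d * d / 6.0
    return total / (n - 2)


def epanechnikov(z):
    """Second-order kernel K(z) = (3/4)(1 - z^2) on [-1, 1]."""
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def var_hat(t, h, ys, kernel=epanechnikov):
    """Estimate of Var(h): the variance estimate plugged into (2.1)."""
    n = len(ys)
    weights = sum(kernel((t - i / n) / h) ** 2 for i in range(1, n + 1))
    return gsj_variance(ys) * weights / (n * h) ** 2
```

The second differences remove any locally linear trend, so the variance estimate is exactly zero for noise-free linear data, and `var_hat` is close to $\hat\sigma^2 Q/nh$ when the window contains many design points.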

It remains to estimate $\mathrm{Bias}(h)$ for which purpose we use

$$ \widehat{\mathrm{Bias}}(h) = \frac{1}{nh} \sum_{i=1}^{n} K_G\!\left(\frac{t - t_i}{h}\right) Y_i, \tag{2.2} $$

where $K_G(z) = K(z) - K_{p+2}(z)$ and $K_{p+2}$ is any $(p + 2)$th order kernel. In Section 4 we specialize to $p
= 2$ and use the 4th order kernel studied by Schucany (1989): $K_4(z) = [K(z) - c^3 K(cz)]/(1 - c^2)$, with $c
= .671$ and $K(z) = \frac{3}{4}(1 - z^2)$, $|z| \le 1$.
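The kernel construction can be verified numerically. The sketch below, with illustrative function names and a simple midpoint rule that are assumptions of this example, checks that $K_4$ integrates to one and has a vanishing second moment, and applies (2.2) to noise-free quadratic data, for which the bias of the second-order estimate is $h^2 k_2 m''(t)/2! = .2h^2$.

```python
C = 0.671  # constant c quoted in the text


def K(z):
    """Second-order kernel K(z) = (3/4)(1 - z^2) on [-1, 1]."""
    return 0.75 * (1.0 - z * z) if abs(z) <= 1.0 else 0.0


def K4(z):
    """Fourth-order kernel [K(z) - c^3 K(cz)] / (1 - c^2); support is |z| <= 1/c."""
    return (K(z) - C ** 3 * K(C * z)) / (1.0 - C ** 2)


def KG(z):
    """Bias kernel K_G = K - K_{p+2} with p = 2."""
    return K(z) - K4(z)


def moment(kernel, j, half_width=1.0 / C, steps=20000):
    """Midpoint-rule approximation of the j-th moment of a kernel."""
    dz = 2.0 * half_width / steps
    zs = (-half_width + (s + 0.5) * dz for s in range(steps))
    return sum(z ** j * kernel(z) for z in zs) * dz


def bias_hat(t, h, ys, kernel=KG):
    """Bias estimate (2.2) on the equally spaced design t_i = i/n."""
    n = len(ys)
    total = sum(kernel((t - i / n) / h) * ys[i - 1] for i in range(1, n + 1))
    return total / (n * h)
```

Note that $K(cz)$ is nonzero for $|z| \le 1/c$, so the window used by $K_4$ is wider than that of $K$ by the factor $1/c \approx 1.49$.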

A heuristic motivation for the use of (2.2) as an estimator of $\mathrm{Bias}(h)$ can be derived from the
asymptotic form (1.4). For large $n$ and a kernel of order $\nu$ we will have $E\hat m_h(t) \approx m(t) +
h^{\nu} k_{\nu} m^{(\nu)}(t)/\nu!$. Taking the difference of such asymptotic expressions for $\nu = p$ and $p + 2$ leaves the
lead term $h^p k_p m^{(p)}(t)/p!$ as required. In actuality $m$ need not have $p + 2$ derivatives for $\widehat{\mathrm{Bias}}$ to be
effective but requires only slightly more smoothness than membership in $C^p[0, 1]$ (see Section 3 for
more details). Our approach is closely related to twicing for estimation of bias (Stuetzle and Mittal,
1979) and can, in fact, be shown to include twicing as a special case in an asymptotic sense.

To obtain the estimators $\hat A$ and $\hat B$ in (1.7) we evaluate (2.1) and (2.2) over a grid of
predetermined bandwidths, $h_1, \ldots, h_k$. In practice the number in this grid may be reasonably small and
we have used $k = 7$ successfully. Thus, at a fixed $t$ one obtains $k$ estimates from (2.1), which we denote
by $(v_1, \ldots, v_k) = (\widehat{\mathrm{Var}}(h_1), \ldots, \widehat{\mathrm{Var}}(h_k))$ and, by squaring the values from (2.2), $(b_1^2, \ldots, b_k^2) =
(\widehat{\mathrm{Bias}}^2(h_1), \ldots, \widehat{\mathrm{Bias}}^2(h_k))$. These two sets of estimators are then fit to their simple asymptotic
expressions as functions of $h$, namely $A/nh$ and $Bh^4$, via ordinary least squares. The resulting estimates
of $A$ and $B$ are

$$ \hat A = n \Big( \sum_{j=1}^{k} v_j/h_j \Big) \Big( \sum_{j=1}^{k} h_j^{-2} \Big)^{-1} \tag{2.3} $$

and

$$ \hat B = \Big( \sum_{j=1}^{k} b_j^2 h_j^4 \Big) \Big( \sum_{j=1}^{k} h_j^8 \Big)^{-1}. \tag{2.4} $$

By substituting (2.3) and (2.4) into (1.7) we obtain our estimator $\hat h_t$ of $h^*$.
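The fitting step amounts to two regressions through the origin followed by the closed form (1.7). The following sketch shows one way to code it; the function name, the grid endpoints and the constants $A$ and $B$ in the check are hypothetical values chosen for illustration, not values from the paper.

```python
def fit_bandwidth(hs, vs, b2s, n, p=2):
    """Least-squares fits of A/(nh) and B h^(2p), then the solution of (1.7)."""
    # Regress v_j on 1/(n h_j) through the origin.
    a_hat = n * sum(v / h for v, h in zip(vs, hs)) / sum(h ** -2 for h in hs)
    # Regress b_j^2 on h_j^(2p) through the origin.
    b_hat = (sum(b2 * h ** (2 * p) for b2, h in zip(b2s, hs))
             / sum(h ** (4 * p) for h in hs))
    h_hat = (a_hat / (2 * p * n * b_hat)) ** (1.0 / (2 * p + 1))
    return a_hat, b_hat, h_hat


# Sanity check with exact inputs: the fit should recover A and B.
n, A, B = 100, 0.0015, 0.0023                             # hypothetical constants
hs = [0.06 + j * (0.3355 - 0.06) / 6 for j in range(7)]   # hypothetical grid, k = 7
vs = [A / (n * h) for h in hs]
b2s = [B * h ** 4 for h in hs]
a_hat, b_hat, h_hat = fit_bandwidth(hs, vs, b2s, n)
```

In practice `vs` and `b2s` would be the estimated variances and squared bias estimates over the grid; because both fits are forced through the origin, no intercepts are estimated.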

Figure 1 illustrates the idea. It displays fits to the $v_j$ and $b_j^2$ for a specific simulated example
with $p = 2$, $m(t) = \sin(t)$ at $t = .50$, $\sigma = .015$ and $n = 100$. Actually the $\mathrm{Bias}^2$ curve is multiplied
by 4 so that the intersection of the two curves occurs at the desired value, $\hat h_t$, given by (1.7). The
maximum bandwidth in the grid of values has been chosen to be the largest permissible without
encountering boundary bias. This implies $\max h_j = .671 \max\{t - t_1, t_n - t\}$. The minimum bandwidth
is large enough for a sufficient number of points to be in the window. We have set min $h_j$ = 6 min(...

Figure 1.
Least Squares Fits to Estimated Variance and Bias$^2$ Values
for a Grid of Fixed Bandwidths (horizontal axis: h).
This lower endpoint is not critical to the stability of the algorithm; the variance estimates in (2.1)
use all of the data regardless of the magnitude of $h$. The bias estimates do become erratic when too
few points get nonzero weights, but the curve $Bh^4$ is forced through the origin and thus an errant
positive value at one small $h$ has no noticeable impact on the fit. Design considerations for efficient
estimation of $A$ and $B$ produce two designs that are skewed in opposite directions. To balance these
and have stable estimates a reasonable compromise appears to be equally spaced values of $h_j$. On the
other hand, estimating $B$ is more difficult than estimating $A$. Consequently, a grid of values more
concentrated toward the right might prove beneficial, although we have not experimented with this to
any great extent.


3.  ASYMPTOTIC  PROPERTIES 


In this section we state and prove our principal asymptotic results. Recall that the data values
are equally spaced and the errors are independent with common variance $\sigma^2$. We will require an
assumption of Lipschitz continuity for $m^{(p)}$. By this we mean the following: a function $f$ is said to be
Lipschitz continuous of order $0 < \gamma \le 1$ if there exists a finite constant $M$ such that $\sup |f(s) - f(t)| \le
M|s - t|^{\gamma}$.


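Theorem. Assume that $m^{(p)}$ is Lipschitz continuous of order $0 < \gamma \le 1$ and the $h_j$ satisfy $h_j = C_j n^{-\alpha}$
for $0 < \alpha < 1$ and $0 < C_1 < \cdots < C_k < \infty$. Then if $1/(2(p + \gamma) + 1) < \alpha < 1/2$,

$$ n^{(1-(2p+1)\alpha)/2} (\hat h_t/h^* - 1) \xrightarrow{d} N(0, 4\theta^2/B(2p+1)^2), $$

where

$$ \theta^2 = \sigma^2 \sum_{i=1}^{k} \sum_{j=1}^{k} (C_i C_j)^{3p-1} \int_{-1}^{1} K_G(u/C_i) K_G(u/C_j)\,du \Big/ \Big( \sum_{l=1}^{k} C_l^{4p} \Big)^2 $$

and $B$ is the coefficient for the dominant term in the squared bias that is estimated by (2.4).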


The Theorem states, among other things, that $\hat h_t/h^* - 1 = O_p(n^{-(1-(2p+1)\alpha)/2})$. Thus, $\hat h_t/h^*$
converges to 1 in probability provided that $\alpha < 1/(2p + 1)$. The rate of convergence is quite slow for
$\alpha$ close to $1/(2p + 1)$ but can be much faster if $\gamma$ is large and $\alpha$ is selected to be small. These rates
may be inherent to the local bandwidth selection problem (cf. Staniswalis, 1989, and the Corollary
below).

It is interesting to compare the rates given by the Theorem with those for global bandwidth
estimators. In the case of global bandwidth estimates obtained by cross-validation, it is known that
when $p = 2$ the ratio of the bandwidth estimate to the optimal bandwidth will converge to one at the
rate $n^{-1/10}$, regardless of the value of $\gamma$ (cf. Härdle, Hall and Marron, 1988, and Park and Marron,
1990). In contrast, by choosing $\alpha$ appropriately, we can always make $\hat h_t/h^*$ converge faster than
$n^{-1/10}$ if $\gamma$ is sufficiently close to one.
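The comparison with the cross-validation rate can be made concrete with exact rational arithmetic. The sketch below takes the rate exponent $(1 - (2p+1)\alpha)/2$ and the lower limit $1/(2(p + \gamma) + 1)$ for $\alpha$ from the Theorem; the function names are assumptions of this example. It shows that for $p = 2$ the attainable exponent approaches $\gamma/(5 + 2\gamma)$, which exceeds $1/10$ exactly when $\gamma > 5/8$.

```python
from fractions import Fraction


def alpha_lower_bound(p, gamma):
    """Lower limit 1/(2(p + gamma) + 1) for alpha in the Theorem."""
    return 1 / (2 * (p + Fraction(gamma)) + 1)


def rate_exponent(p, alpha):
    """Exponent r in h_hat/h* - 1 = O_p(n^(-r)), r = (1 - (2p + 1) alpha)/2."""
    return (1 - (2 * p + 1) * Fraction(alpha)) / 2


def best_exponent(p, gamma):
    """Supremum of the exponent, approached as alpha decreases to its lower limit."""
    return rate_exponent(p, alpha_lower_bound(p, gamma))
```

For example, `best_exponent(2, 1)` equals 1/7, compared with the exponent 1/10 quoted for cross-validation, while `best_exponent(2, Fraction(5, 8))` equals 1/10.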

Of course the Theorem establishes much more than just rates of convergence. The asymptotic
normality result may be useful for constructing confidence intervals for $h^*$. Discussion of this point in a
related setting can be found in Härdle, Hall and Marron (1988).


The conditions on the range of $\alpha$ restrict the $h_j$ from being either too small or too large. The
lower bound is needed to insure that $n^{(1-(2p+1)\alpha)/2}(\hat B - B)$ is asymptotically normal. This property of
$\hat B$ is what drives the proof of the Theorem. An essentially identical condition is needed by
Staniswalis (1989) for the bandwidth of a pilot estimator of $m^{(p)}$ that was used to construct her
estimator of the optimal local bandwidth.


Proof of Theorem. To establish the Theorem two lemmas are required that we now state and prove.

Lemma 1. $nh\widehat{\mathrm{Var}}(h) - A = O_p((nh)^{-1}) + O_p(n^{-1/2})$ if $nh \to \infty$ as $n \to \infty$.

Proof. The proof follows from standard arguments (cf. Eubank, 1988, Lemma 4.1) and the fact that $\hat\sigma^2
- \sigma^2 = O_p(n^{-1/2})$ (cf. Gasser et al., 1986). □
Lemma 2. Let $B = D^2$, $\hat D_{h_j} = \widehat{\mathrm{Bias}}(h_j)/h_j^p$ and set $\hat D = (\hat D_{h_1}, \ldots, \hat D_{h_k})'$. Then, if $m^{(p)}$ is Lipschitz
continuous of order $\gamma$, $h_j = C_j n^{-\alpha}$ for $0 < C_1 < \cdots < C_k < \infty$ and $\alpha > 1/(2(p + \gamma) + 1)$,

$$ n^{(1-(2p+1)\alpha)/2} (\hat D - D\mathbf{1}) \xrightarrow{d} N_k(0, \sigma^2 \Sigma) $$

with $\mathbf{1}$ a $k$-vector of all unit elements and $\Sigma$ a $k \times k$ matrix having typical element

$$ \sigma_{ij} = \int_{-1}^{1} K_G(u/C_i) K_G(u/C_j)\,du / (C_i C_j)^{p+1}. $$

Proof. Using Theorem 4.2 of Müller (1988) and standard arguments one can show that
$n^{(1-(2p+1)\alpha)/2} a'(\hat D - E\hat D)$ converges in distribution to a zero mean normal random variable with
variance $\sigma^2 a'\Sigma a$ for any vector $a$ such that this quantity is nonzero. Since $E\hat D - D\mathbf{1}$ is $O(n^{-\alpha\gamma})$ as a
consequence of our assumptions about $m$, the lemma now follows from our conditions on $\alpha$ and the
definition of asymptotic normality (cf. Serfling, 1980, pg. 21). □

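We are now ready to establish the Theorem. Our estimator of $A$ is

$$ \hat A = n^{1-\alpha} \Big[ \sum_{j=1}^{k} \widehat{\mathrm{Var}}(h_j)/C_j \Big] \Big/ \sum_{j=1}^{k} C_j^{-2} = A + O_p(n^{\alpha - 1/2}), $$

by Lemma 1 and our assumptions on the $h_j$. Also, from Lemma 2 and results on transformations of
asymptotically normal random variables we have

$$ n^{(1-(2p+1)\alpha)/2} [\hat B - B] \xrightarrow{d} N(0, 4B\theta^2). $$

The Theorem can now be established by writing

$$ n^{(1-(2p+1)\alpha)/2} \{ \hat h_t/h^* - 1 \} = n^{(1-(2p+1)\alpha)/2} \{ [(\hat A/A)/(\hat B/B)]^{1/(2p+1)} - 1 \} $$

$$ = n^{(1-(2p+1)\alpha)/2} \{ [(1 + O_p(n^{\alpha - 1/2}))/(1 + (\hat B - B)/B)]^{1/(2p+1)} - 1 \} $$

$$ = -n^{(1-(2p+1)\alpha)/2} (\hat B - B)/B(2p+1) + O_p(n^{\alpha - 1/2}) + O_p(n^{-(2p-1)\alpha/2}). $$

Applying our asymptotic normality result for $\hat B$ completes the proof. □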


As a final comment we note that similar techniques to those used in proving the Theorem can
be used to establish rates of convergence for a plug-in estimator of $h^*$ such as

$$ \hat h_{t,PI} = \left[ \hat\sigma^2 Q / 2pn (k_p \hat m^{(p)}(t)/p!)^2 \right]^{1/(2p+1)}, $$

where $\hat m^{(p)}(t)$ is a kernel estimator of $m^{(p)}(t)$ with bandwidth $b$ and kernel
$K_D$. Specifically, we have the following.

Corollary. Assume that $m^{(p)}$ is Lipschitz continuous of order $0 < \gamma \le 1$ and that $b$ satisfies $b \sim n^{-\alpha}$
for $1/(2(p + \gamma) + 1) < \alpha < 1/2$. Then,

$$ n^{(1-(2p+1)\alpha)/2} [\hat h_{t,PI}/h^* - 1] \xrightarrow{d} N\Big(0, 4\sigma^2 \int_{-1}^{1} K_D^2(u)\,du / m^{(p)}(t)^2\Big). $$

4.  FINITE  SAMPLE  PERFORMANCE 


To demonstrate the implementation and investigate the stability of the algorithm for $\hat h_t$
several example problems were generated on the IBM 3081D computer at Southern Methodist
University. A Fortran program using IMSL subroutines computed observations $y_1, \ldots, y_n$ for $n = 50$,
100, 200, 400 and 1000 from (1.1) with $m(t) = \sin(t)$. The disturbances were obtained by generating
standard normal random deviates that were then rescaled to have standard deviations $\sigma = .005$ and
.05. At $t = .5$ the true value of interest is $m(.5) = .479$.
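The data-generating step of this experiment is easy to reproduce in outline. The sketch below is not the original Fortran/IMSL program; it uses Python's standard generator, and the function name and seed are assumptions made for illustration.

```python
import math
import random


def simulate(n, sigma, seed=0):
    """Draw y_i = sin(t_i) + sigma * e_i with t_i = i/n and e_i standard normal."""
    rng = random.Random(seed)
    ts = [i / n for i in range(1, n + 1)]
    ys = [math.sin(t) + sigma * rng.gauss(0.0, 1.0) for t in ts]
    return ts, ys


ts, ys = simulate(100, 0.05)
```

Fixing the seed makes a replication reproducible; a Monte Carlo study along the lines of the one described here would loop over different seeds for each $(n, \sigma)$ pair.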

The estimator $\hat h_t$ was computed for each sample using $k = 7$ and $K_4(z) = [K(z) - c^3 K(cz)]/(1 -
c^2)$, with $c = .671$ and $K(z) = \frac{3}{4}(1 - z^2)$, $|z| \le 1$. Also computed were two "competing" bandwidths:
namely, the true asymptotically optimal bandwidth $h^*$ from (1.5) and an optimal plug-in type
estimator, $\hat h_{t,PI}$. The estimator $\hat h_{t,PI}$ is obtained by using the estimator $\hat\sigma^2$ in place of $\sigma^2$ and a kernel
estimator for $m''(t)$ in (1.5). Although this estimator is data dependent, the bandwidth, $b$, that is used
for $\hat m''(t)$ has been set at its asymptotically optimal value. More specifically,

$$ \hat m''(t) = \frac{1}{nb^3} \sum_{i=1}^{n} K^*\!\left(\frac{t - t_i}{b}\right) Y_i, $$

where $K^*(z) = 105(-5z^4 + 6z^2 - 1)/16$, the optimal kernel of order (2, 4) from
Gasser, Müller and Mammitzsch (1985), and

$$ b = \left[ 5\sigma^2 Q^* / 4n (k^* m^{(4)}(t)/4!)^2 \right]^{1/9} $$

with $Q^* = \int_{-1}^{1} K^{*2}(z)\,dz$ and $k^* = \int_{-1}^{1} z^4 K^*(z)\,dz$.
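The ingredients of the plug-in rule can be checked in the same way. In the sketch below the function names and the midpoint rule are assumptions of the example; it verifies the moment properties of $K^*$ (zeroth moment 0, second moment $2!$), evaluates $Q^*$ and $k^*$ numerically, and codes the pilot bandwidth $b$ as the minimizer of the leading mse terms $\sigma^2 Q^*/nb^5 + (b^2 k^* m^{(4)}(t)/4!)^2$.

```python
import math


def k_star(z):
    """Kernel 105(-5z^4 + 6z^2 - 1)/16 on [-1, 1] for second-derivative estimation."""
    if abs(z) > 1.0:
        return 0.0
    return 105.0 * (-5.0 * z ** 4 + 6.0 * z ** 2 - 1.0) / 16.0


def integrate(f, steps=20000):
    """Midpoint rule on [-1, 1]."""
    dz = 2.0 / steps
    return sum(f(-1.0 + (s + 0.5) * dz) for s in range(steps)) * dz


def second_derivative_hat(t, b, ys):
    """Kernel estimate of m''(t) on the equally spaced design t_i = i/n."""
    n = len(ys)
    total = sum(k_star((t - i / n) / b) * ys[i - 1] for i in range(1, n + 1))
    return total / (n * b ** 3)


def pilot_bandwidth(sigma, n, m4, q_star, k4_star):
    """Asymptotically optimal bandwidth b for estimating m''(t)."""
    c = k4_star * m4 / math.factorial(4)
    return (5.0 * sigma ** 2 * q_star / (4.0 * n * c ** 2)) ** (1.0 / 9.0)


q_star = integrate(lambda z: k_star(z) ** 2)     # Q*, close to 35
k4_star = integrate(lambda z: z ** 4 * k_star(z))  # k*, close to 4/3
```

For noise-free quadratic data the estimate `second_derivative_hat` returns a value close to the true second derivative, which gives a quick check on the $1/nb^3$ scaling.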

Average values of the estimated bandwidths, $\hat h_t$ and $\hat h_{t,PI}$, along with their Monte Carlo standard
deviations were calculated for M = 1000 replications. Table 1 summarizes the results. Examination of
the values in the table reveals that the proposed bandwidth estimator is comparable to the ideal plug-in
estimator for estimation of $h^*$ but tends to be more stable. The greater variability of $\hat h_{t,PI}$ is
evidently due to estimating $m''(t)$ using only a single bandwidth. Even though $b$ is set at the "ideal"
asymptotically optimal value for $\hat m''(t)$, $\hat h_t$ appears to gain stability from the smoothing afforded by
the fit across several bandwidths used for $\hat B$ and $\hat A$.

Table 1
Summary of Average Bandwidths over M = 1000
Monte Carlo Repetitions (Standard Deviations in parentheses)

                 Asymptotically    Adaptive              Ideal                     Correlation
  σ        n     Optimal, h*       Choice, $\hat h_t$    Plug-In, $\hat h_{t,PI}$  r($\hat h_t$, $\hat h_{t,PI}$)

 .005     50     .1267             .1258 (.0080)         .1206 (.0095)             .88
         100     .1103             .1104 (.0048)         .1095 (.0062)             .87
         200     .0960             .0962 (.0029)         .0967 (.0037)             .85
         400     .0836             .0839 (.0018)         .0838 (.0022)             .87
        1000     .0696             .0699 (.0009)         .0698 (.0011)             .86

 .05      50     .3182             .2914 (.035)          .2719 (.049)              .67
         100     .2770             .2772 (.037)          .2715 (.053)              .67
         200     .2412             .2467 (.032)          .2551 (.054)              .66
         400     .2099             .2125 (.020)          .2220 (.046)              .70
        1000     .1748             .1764 (.010)          .1815 (.028)              .71

The rate of convergence of $\hat h_t$, as measured by the decrease in its standard deviation as $n$
increases, appears to be better than that predicted for cross validation. The asymptotics begin to take
effect more slowly when the noise is greater. In other words, at $\sigma = .05$, the standard deviation of $\hat h_t$
does not begin its characteristic decline until after $n = 200$. This delay is still more pronounced for the
plug-in estimator.

The final column of the table contains the Monte Carlo correlation coefficient between $\hat h_t$ and
$\hat h_{t,PI}$ over the 1000 pairs of estimated bandwidths. The strong correlation should not be surprising since
both techniques are trying to estimate the same unknown ingredients of $h^*$ in (1.5). The correlation
appears to be very weakly dependent upon $n$ and a decreasing function of $\sigma$.

It is important to note that the primary interest is not in estimating $h^*$. The main objective is
to have some practical method for local bandwidth selection that leads to small finite-sample mse for
the estimate of $m(t)$. Table 2 presents the average of M = 1000 squared errors for the same example and the three
bandwidths covered in Table 1. In the low noise case $\sigma = .005$ the two estimated bandwidths yield
about the same results. However, when $\sigma$ is larger the adaptive bandwidth $\hat h_t$ is 5-9% more efficient
than the ideal plug-in rule. The adaptive rule is not competitive with the fixed bandwidth, $h^*$, but, of
course, such a quantity is not available in practice.


It is also of interest to examine the rate of decay of the sample mse sequence as a function of $n$.
We expect this sequence to decay like $n^{-4/5}$ if $h^*$ is being estimated correctly. Figure 2 displays the
mse's on a log-log scale for the asymptotically optimal and adaptive bandwidths when $\sigma = .05$. The
slope of $-4/5$ is apparent and the relative efficiency of the adaptive procedure is improving with
increasing sample size.

Table 2
Comparison of the Effects of Data-Based Bandwidths
on Mean-Square Errors of Estimation

                 Asymptotically    Adaptive              Ideal
  σ        n     Optimal, h*       Choice, $\hat h_t$    Plug-In, $\hat h_{t,PI}$

 .005     50     .0247             .0467                 .0556
         100     .0344             .0196                 .0203
         200     .0097             .0101                 .0101
         400     .0052             .0056                 .0057
        1000     .0028             .0028                 .0029

 .05      50     1.2523            1.4233                1.5502
         100     .6675             .7324                 .7721
         200     .3716             .4002                 .4232
         400     .2150             .2341                 .2513
        1000     .1102             .1165                 .1222

Entries are $10^4 \times \mathrm{mse}[\hat m(t; h)]$ with three different rules for $h$.

Figure 2.
Mean Square Errors versus Sample Size for Asymptotically
Optimal(«) and Adaptive(+) Bandwidths (horizontal axis: LOG(N)).
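The decay rate and the efficiency comparison can be read off Table 2 directly. The following sketch fits a least-squares line to the $\sigma = .05$ entries on a log-log scale and forms the ratio of plug-in to adaptive mse; the function name is an assumption of the example, and the data are the tabulated values.

```python
import math

ns = [50, 100, 200, 400, 1000]
# Table 2, sigma = .05 block; entries are 10^4 x mse.
mse_opt = [1.2523, 0.6675, 0.3716, 0.2150, 0.1102]
mse_adapt = [1.4233, 0.7324, 0.4002, 0.2341, 0.1165]
mse_plug = [1.5502, 0.7721, 0.4232, 0.2513, 0.1222]


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


slope_opt = loglog_slope(ns, mse_opt)
slope_adapt = loglog_slope(ns, mse_adapt)
efficiency = [p / a for p, a in zip(mse_plug, mse_adapt)]
```

Both fitted slopes are within a few hundredths of $-4/5$, and the mse ratios of the plug-in rule to the adaptive rule lie between roughly 1.05 and 1.09, consistent with the 5-9% figure quoted in the text.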

To further illustrate the utility of our proposed rule, another simulated data set has been used to
produce a curve estimate in Figure 3. In this example $m(t)$ is estimated at each $t$ using $\hat m_{\hat h_t}(t)$. The
function being estimated is $m(t) = 4.26(e^{-3.25t} - 4e^{-6.5t} + 3e^{-9.75t})$ which has been used in
numerous studies, e.g., Staniswalis (1989). To eliminate the complication of the boundary effects
realizations of $Y_i$ for $t_i$ in the interval $(-1, 2)$ are used in producing the estimates in $[0, 1]$. The scale
on the left is for the $Y_i$ points, the solid line represents $m(t)$ and the dashed line corresponds to the
estimator of $m$ based on our local bandwidth estimates. Superimposed on this graph are curves for the
local bandwidths. The scale on the right is for values of $h^*$ (solid line) and $\hat h_t$ (dotted line). The
estimated bandwidths perform as one would hope by increasing and decreasing according to the
curvature of $m$. The spikes in the asymptotically optimal bandwidths correspond to values of $m''(t)$
near zero. That the peaks in $\hat h_t$ occur in different places is simply an indication that the finite sample
bias has an (estimated) minimum other than where the dominant term vanishes.

5.  CONCLUSIONS 


In  this  paper  we  have  proposed  a  new  method  for  local  bandwidth  selection.  This  technique  has 
been  shown  to  be  practical  and  perform  well  in  finite  samples.  The  asymptotic  properties  of  the 
bandwidth  estimator  have  been  derived  and  it  was  found  to  be  both  consistent  and  asymptotically 
normal. 

An important question that is now under study is how to adapt our bandwidth selection
technique for use when $t$ is near the boundary region. More research is also needed on how to best
choose the grid of bandwidths used with the algorithm. Experimentation with different grids suggests
that it may improve the procedure to use a larger minimum for the $h_j$. A different bias estimator may
also prove useful in this regard. The possibility exists, for example, of estimating $B$ with a generalized
or robust alternative to ordinary least squares. Such an approach has the potential to alleviate effects
produced by the grid of bandwidths being either too small (so that the bias estimates are very noisy) or
too large (causing extrapolation beyond the range of adequacy of the Taylor approximation). Although
these fine tuning issues remain open for study, we believe that the initial version of the algorithm
presented here is sufficient to demonstrate its potential superiority over plug-in rules.

Figure 3.
... Adaptive Local Bandwidths.
$m(t)$ solid and $\hat m(t; \hat h_t)$ dashed;
... and $\hat h_t$ (dotted); $n = 100$, $\sigma = .10$

To conclude, we note that the basic procedure outlined above can be extended in a number of
directions. For example, a similar adaptive approach may be developed for local bandwidth selection in
probability density estimation. The variance would, of course, need to be estimated differently in this
setting.